Using Noun Phrase Heads to Extract Document Keyphrases
Authors
Abstract
Automatically extracting keyphrases from documents is a task with many applications in information retrieval and natural language processing. Document retrieval can be biased towards documents containing relevant keyphrases; documents can be classified or categorized based on their keyphrases; automatic text summarization may extract sentences with high keyphrase scores. This paper describes a simple system for choosing noun phrases from a document as keyphrases. A noun phrase is chosen based on its length, its frequency and the frequency of its head noun. Noun phrases are extracted from a text using a base noun phrase skimmer and an off-the-shelf online dictionary. Experiments involving human judges reveal several interesting results: the simple noun phrase-based system performs roughly as well as a state-of-the-art, corpus-trained keyphrase extractor; ratings for individual keyphrases do not necessarily correlate with ratings for sets of keyphrases for a document; agreement among unbiased judges on the keyphrase rating task is poor.
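The selection criteria named in the abstract (phrase length, phrase frequency, and head noun frequency) can be illustrated with a minimal sketch. The abstract does not give the actual scoring formula, so the combination used here (phrase frequency times length, plus head noun frequency) is an assumption made purely for illustration, as are the function names `head_noun` and `rank_noun_phrases`. The sketch also assumes that noun phrases have already been extracted and that the head of an English base noun phrase is its final word.

```python
from collections import Counter


def head_noun(phrase: str) -> str:
    # Assumption: the head of an English base noun phrase is its last word.
    return phrase.lower().split()[-1]


def rank_noun_phrases(noun_phrases, top_k=5):
    """Rank candidate noun phrases by a hypothetical score.

    The score combines the three signals named in the abstract
    (length, phrase frequency, head noun frequency); the exact
    formula is an illustrative assumption, not the paper's.
    """
    # Normalize case and whitespace; drop empty candidates.
    phrases = [" ".join(p.lower().split()) for p in noun_phrases if p.strip()]
    phrase_freq = Counter(phrases)
    head_freq = Counter(head_noun(p) for p in phrases)

    def score(p: str) -> int:
        length = len(p.split())
        return phrase_freq[p] * length + head_freq[head_noun(p)]

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(phrase_freq, key=lambda p: (-score(p), p))
    return ranked[:top_k]


candidates = [
    "keyphrase extraction",
    "keyphrase extraction",
    "noun phrase",
    "head noun",
    "phrase",
    "automatic keyphrase extraction",
]
top_phrases = rank_noun_phrases(candidates, top_k=3)
```

In this sketch, phrases that share a frequent head noun ("extraction") are boosted together, which is one plausible reading of why head noun frequency is a useful signal.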
Similar resources
Finding nuggets in documents: A machine learning approach
However, many text mining applications do not have adequate natural language processing ability beyond simple keyword indexing, and as a result, too many textual elements (words) are included in the analysis. We argue that noun phrases as textual elements are better suited for text mining and could provide more discriminating power than single words. Discourse representation theory (Kamp...
Cross-language Entity Linking Adapting to User’s Language Ability
In this paper, we propose a method to automatically discover valuable keyphrases in Japanese and link these keyphrases to related Chinese Wikipedia pages. The method has four stages. First, we extract nouns from a Japanese document using a morphological analyzer and extract keyphrase candidates using a method called Top Consecutive Nouns Cohesion (TCNC) [1]. Then, we j...
Accurate Keyphrase Extraction from Scientific Papers by Mining Linguistic Information
In this paper, we investigate the impact of filtering candidate terms using linguistic information on the accuracy of automatic keyphrase extraction from scientific papers. According to linguistic knowledge, noun phrases are the most likely to be keyphrases. However, the definition of a noun phrase can vary from one system to another. We have identified five POS tag sequence definitions of a noun p...
Investigating Embedded Question Reuse in Question Answering
The investigation presented in this paper concerns a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another, related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...
An Automatic Computational Geometric Key-phrase Extraction for Resolving the Contextual Retrieval Problem
Keyphrases are useful in a number of tasks such as summarizing, indexing, labelling and searching. The purpose of automatic keyphrase extraction is to select keyphrases from within the text of a given document. However, with automatic key-phrase extraction it is possible to generate keyphrases for the vast number of documents that do not have manually assigned key-phrases of previous key-phrase extr...